Abstract

Recovering or estimating the initial state of a high-dimensional system can require a large number of 
measurements. In this paper, we explain how this burden can be significantly reduced for certain linear 
systems when randomized measurement operators are employed. Our work builds upon recent results 
from Compressive Sensing (CS) and sparse signal recovery. In particular, we make the connection to CS 
analysis for random block diagonal matrices. 

By deriving Concentration of Measure (CoM) inequalities, we show that the observability matrix 
satisfies the Restricted Isometry Property (RIP) (a sufficient condition on the measurement matrix for 
stable recovery of sparse vectors) under certain conditions on the state transition matrix. For example, we 
show that if the state transition matrix is unitary, and if independent, randomly-populated measurement 
matrices are employed, then it is possible to uniquely recover a sparse high-dimensional initial state when 
the total number of measurements scales linearly in the sparsity level (the number of non-zero entries) 
of the initial state and logarithmically in the state dimension. This is a significant reduction in
the sufficient total number of measurements for correct initial state recovery as compared to traditional
observability theory. We further extend our RIP analysis to scaled unitary and symmetric state transition
matrices. We support our analysis with a case study of a two-dimensional diffusion process.

Index Terms 

Observability, Restricted Isometry Property, Concentration of Measure Inequalities, Block Diagonal 
Matrices, Compressive Sensing 

I. Introduction 

In this paper, we consider the problem of recovering the initial state of a high-dimensional system 
from compressive measurements (i.e., we take fewer measurements than the system dimension). 

This work was partially supported by AFOSR Grant FA9550-09-1-0465, NSF Grant CCF-0830320, DARPA Grant
HR0011-08-1-0078, and NSF Grant CNS-0931748.



A. Measurement Burdens in Observability Theory 

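Consider an $N$-dimensional discrete-time linear dynamical system described by the state equation¹

$$x_k = A x_{k-1}, \qquad y_k = C_k x_k, \tag{1}$$

where $x_k \in \mathbb{R}^N$ represents the state vector at time $k \in \{0, 1, 2, \dots\}$, $A \in \mathbb{R}^{N \times N}$ represents the state
transition matrix, $y_k \in \mathbb{R}^M$ represents a set of measurements (or "observations") of the state at time $k$,
and $C_k \in \mathbb{R}^{M \times N}$ represents the measurement matrix at time $k$. (Observe that the number of measurements
at each sample time is $M$.) For any finite set $\Omega \subset \{0, 1, 2, 3, \dots\}$, define the generalized observability
matrix as

$$\mathcal{O}_\Omega := \begin{bmatrix} C_{k_0} A^{k_0} \\ C_{k_1} A^{k_1} \\ \vdots \\ C_{k_{K-1}} A^{k_{K-1}} \end{bmatrix} \in \mathbb{R}^{MK \times N}, \tag{2}$$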
where $\Omega = \{k_0, k_1, \dots, k_{K-1}\}$ contains $K$ observation times. Note that this definition extends the
traditional definition of the observability matrix by considering arbitrary time samples in (2) and matches
the traditional definition when $\Omega = \{0, 1, \dots, K-1\}$. The primary use of observability theory is in
ensuring that a state (say, an initial state $x_0$) can be recovered from a collection of measurements
$\{y_{k_0}, y_{k_1}, \dots, y_{k_{K-1}}\}$. In particular, defining

$$y_\Omega := \begin{bmatrix} y_{k_0}^T & y_{k_1}^T & \cdots & y_{k_{K-1}}^T \end{bmatrix}^T,$$

we have

$$y_\Omega = \mathcal{O}_\Omega x_0. \tag{3}$$
Although we will consider situations where $C_k$ changes with each $k$, we first discuss the classical
case where $C_k = C$ ($C$ is assumed to have full row rank) for all $k$ and $\Omega = \{0, 1, \dots, K-1\}$ (the
observation times are consecutive). In this setting, an important and classical result [2] states that a
system described by the state equation (1) is observable if and only if $\mathcal{O}_\Omega$ has rank $N$ (full column rank)
where $\Omega = \{0, 1, \dots, N-1\}$. One challenge in exploiting this fact is that for some systems, $N$ can be
quite large. For example, distributed systems evolving on a spatial domain can have a large state space
even after taking a spatially-discretized approximation. In such settings, we might therefore require a
very large total number of measurements ($MK$ with $K = N$) to identify an initial state, and moreover,
inverting the matrix $\mathcal{O}_\Omega$ could be very computationally demanding.

¹The results of this paper also apply directly to systems described by a state equation of the form

$$x_k = A x_{k-1} + B u_k, \qquad y_k = C_k x_k + D u_k,$$

where $u_k \in \mathbb{R}^P$ is the input vector at sample time $k$ and $B \in \mathbb{R}^{N \times P}$ and $D \in \mathbb{R}^{M \times P}$ are constant
matrices. Indeed, initial state recovery is independent of $B$ and $D$ when it is assumed that the input
vector $u_k$ is known for all sample times $k$.

This raises an interesting question: under what circumstances might we be able to infer the initial state
of a system when $MK < N$? We might imagine, for example, that the measurement burden could be
alleviated in cases when there is a model for the state $x_0$ that we wish to recover. Alternatively, we
may have cases where, rather than needing to recover $x_0$ from $y_\Omega$, we desire only to solve a much
simpler inference problem such as a binary detection or a classification problem. In this paper, inspired
by the emerging theory of Compressive Sensing (CS) [3]-[5], we explain how such assumptions can
indeed reduce the measurement burden and, in some cases, even allow recovery of the initial state when
$MK < N$ and the system of equations (3) is guaranteed to be underdetermined.

B. Compressive Sensing and Randomized Measurements 

The CS theory states that it is possible to solve certain rank-deficient sets of linear equations by
imposing a model assumption on the signal to be recovered. In particular, suppose $y = \Phi x$ where $\Phi$
is an $M \times N$ matrix with $M < N$. Suppose also that $x \in \mathbb{R}^N$ is $S$-sparse, meaning that only $S$ out
of its $N$ entries are non-zero.² If $\Phi$ satisfies a condition called the Restricted Isometry Property (RIP)
of order $2S$ for a sufficiently small isometry constant $\delta_{2S}$, then it is possible to uniquely recover any
$S$-sparse signal $x$ from the measurements $y = \Phi x$ using a tractable convex optimization program known
as $\ell_1$-minimization (see, e.g., [6]). The RIP also ensures that the recovery process is robust to noise and
stable in cases where $x$ is not precisely sparse [7]. Similar statements can be made for recovery using
various iterative greedy algorithms [8]-[11]. In the following, we provide the definition of the RIP.



Definition 1: A matrix $\Phi \in \mathbb{R}^{M \times N}$ ($M < N$) is said to satisfy the RIP of order $S$ with isometry
constant $\delta_S \in (0, 1)$ if

$$(1 - \delta_S)\|x\|_2^2 \le \|\Phi x\|_2^2 \le (1 + \delta_S)\|x\|_2^2 \tag{4}$$

holds for all $S$-sparse vectors $x \in \mathbb{R}^N$.

Observe that the RIP is a property of a matrix and has a deterministic definition. However, checking
whether the RIP holds for a given matrix $\Phi$ is computationally expensive and is almost impossible when
$N$ is large. A common way to establish the RIP for $\Phi$ is to populate $\Phi$ with random entries. If $\Phi$ is
populated with independent and identically distributed (i.i.d.) Gaussian random variables having zero
mean and variance $\frac{1}{M}$, for example, then $\Phi$ will satisfy the RIP of order $S$ with isometry constant $\delta_S$
with very high probability when $M$ is proportional to $\delta_S^{-2} S \log\frac{N}{S}$. This result is significant because it
indicates that the number of measurements sufficient for correct recovery scales linearly in the sparsity
level $S$ and only logarithmically in the ambient dimension $N$. Other random distributions may also be
considered, including matrices with uniform entries of random signs. Consequently, a number of new
sensing hardware architectures, from analog-to-digital converters to digital cameras, are being developed
to take advantage of the benefits of random measurements [12]-[15].

²This is easily extended to the case where $x$ is sparse in some transform domain.
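As a rough numerical illustration of the near-isometry behaviour described above, the following sketch draws one Gaussian matrix with variance $1/M$ entries and evaluates $\|\Phi x\|_2^2 / \|x\|_2^2$ on randomly drawn $S$-sparse vectors. The dimensions, the seed, and the helper names are illustrative assumptions, not values from the paper. Sampling random sparse vectors does not certify the RIP, which is a statement about all $S$-sparse vectors; it only shows the typical spread of the ratio around 1.

```python
import numpy as np

def gaussian_sensing_matrix(M, N, rng):
    """M x N matrix with i.i.d. zero-mean Gaussian entries of variance 1/M."""
    return rng.normal(0.0, 1.0 / np.sqrt(M), size=(M, N))

def random_sparse_vector(N, S, rng):
    """Vector in R^N with exactly S non-zero (Gaussian) entries."""
    x = np.zeros(N)
    support = rng.choice(N, size=S, replace=False)
    x[support] = rng.normal(size=S)
    return x

def empirical_isometry_ratios(Phi, S, trials, rng):
    """Return ||Phi x||^2 / ||x||^2 for `trials` random S-sparse vectors x."""
    N = Phi.shape[1]
    ratios = np.empty(trials)
    for t in range(trials):
        x = random_sparse_vector(N, S, rng)
        ratios[t] = np.sum((Phi @ x) ** 2) / np.sum(x ** 2)
    return ratios

# Illustrative sizes (assumptions): M is well below N but large relative to S.
rng = np.random.default_rng(0)
N, S, M = 1000, 10, 200
Phi = gaussian_sensing_matrix(M, N, rng)
ratios = empirical_isometry_ratios(Phi, S, trials=500, rng=rng)
print("min ratio:", ratios.min(), "max ratio:", ratios.max())
```

The observed minimum and maximum give an optimistic, sampled estimate of $1 - \delta_S$ and $1 + \delta_S$ for this particular draw of $\Phi$.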



A simple way [16], [17] of proving the RIP for a randomized construction of $\Phi$ involves first showing
that the matrix satisfies a Concentration of Measure (CoM) inequality akin to the following.

Definition 2: A random matrix (a matrix whose entries are drawn from a particular probability
distribution) $\Phi \in \mathbb{R}^{M \times N}$ is said to satisfy the Concentration of Measure (CoM) inequality if for any fixed
signal $x \in \mathbb{R}^N$ (not necessarily sparse) and any $\epsilon \in (0, \bar{\epsilon})$,

$$\mathbf{P}\left\{ \left| \|\Phi x\|_2^2 - \|x\|_2^2 \right| > \epsilon \|x\|_2^2 \right\} \le 2 \exp\{-M f(\epsilon)\}, \tag{5}$$

where $f(\epsilon)$ is a positive constant that depends on the isometry constant $\epsilon$, and $\bar{\epsilon} \le 1$ is some maximum
value of the isometry constant for which the CoM inequality holds.

Note that the failure probability in (5) decays exponentially fast in the number of measurements $M$
times some constant $f(\epsilon)$ that depends on the isometry constant $\epsilon$. For most interesting random matrices,
including matrices populated with i.i.d. Gaussian random variables, $f(\epsilon)$ is quadratic in $\epsilon$ as $\epsilon \to 0$.

Baraniuk et al. [16] and Mendelson et al. [18] showed that a CoM inequality of the form (5) can be
used to prove the RIP for random compressive matrices. This result is rephrased by Davenport [19] as
follows.

Lemma 1: [19] Let $\mathcal{X}$ denote an $S$-dimensional subspace in $\mathbb{R}^N$. Let $\delta_S \in (0, 1)$ denote a distortion
factor and $\nu \in (0, 1)$ denote a failure probability, and suppose $\Phi$ is an $M \times N$ random matrix that satisfies
the CoM inequality (5) with

$$M \ge \frac{S \log\left(\frac{42}{\delta_S}\right) + \log\left(\frac{2}{\nu}\right)}{f\left(\frac{\delta_S}{\sqrt{2}}\right)}.$$

Then with probability at least $1 - \nu$,

$$(1 - \delta_S)\|x\|_2^2 \le \|\Phi x\|_2^2 \le (1 + \delta_S)\|x\|_2^2$$

for all $x \in \mathcal{X}$.
Through a union bound argument (see, for example, Theorem 5.2 in [16]) and by applying Lemma 1
for all $\binom{N}{S}$ $S$-dimensional subspaces that define the space of $S$-sparse signals in $\mathbb{R}^N$, one can show that
$\Phi$ satisfies the RIP (of order $S$ and with isometry constant $\delta_S$) with high probability when $M$ scales
linearly in $S$ and logarithmically in $N$.

Aside from connections to the RIP, concentration inequalities such as the above can also be useful
when solving other types of inference problems from compressive measurements. For example, rather
than recovering a signal $x$, one may wish only to solve a binary detection problem and determine whether
a set of measurements $y$ corresponds only to noise (the null hypothesis $y = \Phi(\text{noise})$) or to signal plus
noise ($y = \Phi(x + \text{noise})$). When $\Phi$ is random, the performance of a compressive detector (and of other
multi-signal classifiers) can be studied using concentration inequalities [20], and in these settings it is
not necessary to assume that $x$ is sparse.



C. Observability from Random, Compressive Measurements 

In order to exploit CS concepts in observability analysis, we consider in this paper scenarios where the 
measurement matrices Ck are populated with random entries. Physically, such randomized measurements 
may be taken using the types of CS protocols and hardware mentioned above. Our analysis is therefore 
appropriate in cases where one has some control over the sensing process. 

As is apparent from (2), even with randomness in the matrices $C_k$, the observability matrix $\mathcal{O}_\Omega$ will contain
some structure and cannot simply be modeled as being populated with i.i.d. Gaussian random variables;
thus, existing results cannot be directly applied. Our work builds on a recent paper by Park et al. in
which CoM inequalities are derived for random block diagonal matrices [21]. Our concentration results
cover a large class of systems (not necessarily unitary) and initial states (not necessarily sparse). Apart
from guaranteeing recovery of sparse initial states, other inference problems concerning $x_0$ (such
as detection or classification of more general initial states and systems) can also be solved from the
random, compressive measurements, and the performance of such techniques can be studied using the
CoM bounds that we provide.

While our CoM results are general and cover a broad class of systems and initial states, they have
important implications for establishing the RIP in the case of a sparse initial state $x_0$. Such RIP results
provide a sufficient number of measurements for exact initial state recovery when the initial state is known
to be sparse a priori. The results of this paper show that under certain conditions on $A$ (e.g., for unitary,
scaled unitary, and certain symmetric matrices $A$), the observability matrix $\mathcal{O}_\Omega$ will satisfy the RIP with
high probability when the total number of measurements $MK$ scales linearly in $S$ and logarithmically in
$N$. Before going into the details of the derivations and proofs, we first state in Section II our main results
on establishing the RIP for the observability matrix. We then present in Section III the CoM results upon
which the conclusions for establishing the RIP are based. Finally, in Section IV we support our results
with a case study involving a diffusion process starting from a sparse initial state. As an example, one
could imagine a situation where a few drops of poison have been introduced into particular (i.e., sparse)
locations in a lake of water. From the available measurements at later times, we would like to estimate
the source of the contamination.

D. Related Work 

Questions involving observability in compressive measurement settings have also been raised in a
recent paper [22] concerned with tracking the state of a system from nonlinear observations. Due to
the intrinsic nature of the problems in that paper, however, the observability issues raised are quite
different. For example, one argument appears to assume that $M \ge S$, a requirement that we do not have.
In a recent technical report [23], Dai et al. have also considered a similar sparse initial state recovery
problem. However, their approach is quite different and the results are only applicable in noiseless and
perfectly sparse initial state recovery problems. In this paper, we establish the RIP for the observability
matrix, which implies not only that perfectly sparse initial states can be recovered exactly when the
measurements are noiseless but also that the recovery process is robust with respect to noise and that
nearly-sparse initial states can be recovered with high accuracy [7]. Finally, we note that a matrix vaguely
similar to the observability matrix has been studied by Yap et al. in the context of quantifying the memory
capacity of echo state networks [24].



II. Restricted Isometry Property and the Observability Matrix 

When the observability matrix $\mathcal{O}_\Omega$ satisfies the RIP of order $2S$ with isometry constant $\delta_{2S} < \sqrt{2} - 1$,
an initial state with $S$ or fewer non-zero elements can be stably recovered by solving an $\ell_1$-minimization
problem [7]. (Similar statements can be made for recovery using various iterative greedy algorithms
[8]-[11].) In this section, we present cases where the total number of measurements sufficient for establishing
the RIP scales linearly in $S$ and only logarithmically in the state dimension $N$. As in standard observability
theory, the state transition matrix $A$ plays a crucial role in the analysis. Because the analysis is somewhat
complex, results for completely general $A$ are difficult to obtain. However, we present results here for
unitary, scaled unitary, and certain symmetric matrices, and we believe that these can give interesting
insight into the essential issues driving the initial state recovery problem.

To assist in interpreting the RIP results, let us point out that we actually state our RIP bounds in
terms of the scaled observability matrix $\frac{1}{\sqrt{b}}\mathcal{O}_\Omega$, where $b$ is defined below and is chosen to ensure that the
measurements are properly normalized to be compatible with (4). In noiseless recovery, this scaling is
unimportant. When noise occurs, however, the scaling enters into the effective signal-to-noise ratio for
the problem, as the measurements will be more sensitive to noise when $b$ is small.

Our first result, stated in Theorem 1, applies to a system with dynamics represented by a scaled unitary
matrix when measurements are taken at the first $K$ sample times, starting at zero.

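Theorem 1: Assume $\Omega = \{0, 1, \dots, K-1\}$. Suppose that $A \in \mathbb{R}^{N \times N}$ can be represented as $A = aU$
where $a \in \mathbb{R}$ ($a \neq 0$) and $U \in \mathbb{R}^{N \times N}$ is unitary. Define $b := 1 + a^2 + a^4 + \cdots + a^{2(K-1)}$. Assume each
of the measurement matrices $C_k \in \mathbb{R}^{M \times N}$ is populated with i.i.d. Gaussian random entries with mean
zero and variance $\frac{1}{M}$. Assume all matrices $C_k$ are generated independently of each other. Suppose that
$N$, $S$, and $\delta_S \in (0, 1)$ are given. Then with probability exceeding $1 - \nu$, $\frac{1}{\sqrt{b}}\mathcal{O}_\Omega$ satisfies the RIP of order
$S$ with isometry constant $\delta_S$ whenever

$$MK \ge \frac{512\left((1 - a^2)K + a^2\right)\left(S\left(\log\frac{42}{\delta_S} + 1 + \log\frac{N}{S}\right) + \log\frac{2}{\nu}\right)}{\delta_S^2}, \qquad |a| < 1, \tag{6}$$

$$MK \ge \frac{512\left((1 - a^{-2})K + a^{-2}\right)\left(S\left(\log\frac{42}{\delta_S} + 1 + \log\frac{N}{S}\right) + \log\frac{2}{\nu}\right)}{\delta_S^2}, \qquad |a| > 1. \tag{7}$$

Proof: See Section III-A2. ■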
In the case of unitary $A$, we can relax the consecutive sample times assumption in Theorem 1 (i.e.,
$\Omega = \{0, 1, \dots, K-1\}$). We have the following RIP result when $K$ arbitrarily-chosen samples are taken.

Theorem 2: Assume $\Omega = \{k_0, k_1, \dots, k_{K-1}\}$. Suppose that $A \in \mathbb{R}^{N \times N}$ is unitary. Assume each of
the measurement matrices $C_k \in \mathbb{R}^{M \times N}$ is populated with i.i.d. Gaussian random entries with mean zero
and variance $\frac{1}{M}$. Assume all matrices $C_k$ are generated independently of each other. Suppose that $N$, $S$,
and $\delta_S \in (0, 1)$ are given. Then with probability exceeding $1 - \nu$, $\frac{1}{\sqrt{K}}\mathcal{O}_\Omega$ satisfies the RIP of order $S$
with isometry constant $\delta_S$ whenever

$$MK \ge \frac{512\left(S\left(\log\frac{42}{\delta_S} + 1 + \log\frac{N}{S}\right) + \log\frac{2}{\nu}\right)}{\delta_S^2}. \tag{8}$$

Proof: See Section III-A2. ■
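To get a feel for the size of these bounds, the sketch below evaluates the sufficient total number of measurements for a scaled unitary $A = aU$. It assumes the bound has the form $MK \ge 512\,c(a, K)\,\big(S(\log\frac{42}{\delta_S} + 1 + \log\frac{N}{S}) + \log\frac{2}{\nu}\big)/\delta_S^2$, with $c = (1 - a^2)K + a^2$ for $|a| < 1$, $c = (1 - a^{-2})K + a^{-2}$ for $|a| > 1$, and $c = 1$ in the unitary case. The function name and the example values are hypothetical.

```python
import math

def sufficient_total_measurements(N, S, delta_S, nu, K=1, a=1.0):
    """Sufficient value of M*K under the assumed form of the RIP bounds.

    N: state dimension, S: sparsity level, delta_S: isometry constant in (0, 1),
    nu: failure probability in (0, 1), K: number of sample times,
    a: scaling of the (scaled) unitary state transition matrix A = a*U.
    """
    if not (0.0 < delta_S < 1.0 and 0.0 < nu < 1.0):
        raise ValueError("delta_S and nu must lie in (0, 1)")
    if not (1 <= S <= N) or K < 1 or a == 0:
        raise ValueError("need 1 <= S <= N, K >= 1 and a != 0")

    # Term shared by the unitary and scaled unitary bounds.
    base = 512.0 * (S * (math.log(42.0 / delta_S) + 1.0 + math.log(N / S))
                    + math.log(2.0 / nu)) / delta_S ** 2

    if abs(a) == 1.0:
        factor = 1.0                        # unitary case: no dependence on K
    elif abs(a) < 1.0:
        factor = (1.0 - a ** 2) * K + a ** 2
    else:
        factor = (1.0 - a ** -2) * K + a ** -2
    return factor * base

# Hypothetical example: a million-dimensional state with 10 non-zero entries.
print(sufficient_total_measurements(N=10**6, S=10, delta_S=0.5, nu=0.01))
print(sufficient_total_measurements(N=10**6, S=10, delta_S=0.5, nu=0.01, K=10, a=0.5))
```

The constants in such bounds are pessimistic, so the output is best read as a scaling law in $S$, $N$, and $K$ rather than as a design rule for a specific $M$.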
One should note that when $A = aU$ ($a \neq 1$), the results of Theorem 1 have a dependency on $K$ (the
number of sampling times). This dependency is not desired in general. When $a = 1$ (i.e., $A$ is unitary),
a result (Theorem 2) can be obtained in which the total number of measurements $MK$ scales linearly in
$S$ and with no dependency on $K$. Our general results for $A = aU$ also indicate that when $|a|$ is close to
the origin (i.e., $|a| \ll 1$), and by symmetry when $|a| \gg 1$, worse recovery performance is expected as
compared to the case when $a = 1$. When $|a| \ll 1$, as an example, the effect of the initial state will be
highly attenuated as we take measurements at later times. A similar intuition holds when $|a| \gg 1$. As
mentioned earlier, when $A$ is unitary we can further relax the $K$ consecutive sample times requirement
and instead take $K$ arbitrarily-chosen samples.

Theorem 2, in particular, states that under the assumed conditions, $\frac{1}{\sqrt{K}}\mathcal{O}_\Omega$ satisfies the RIP of order
$S$ with isometry constant $\delta_S$ with high probability when the total number of measurements $MK$ scales
linearly in the sparsity level $S$ and logarithmically in the state ambient dimension $N$. Consequently,
under these assumptions, unique recovery of any $S$-sparse initial state $x_0$ is possible from $y_\Omega = \mathcal{O}_\Omega x_0$
by solving the $\ell_1$-minimization problem or using various iterative greedy algorithms [8], [10] whenever
$MK$ is proportional to $S\log\frac{N}{S}$. This is a significant reduction in the sufficient total number of
measurements for correct initial state recovery as compared to traditional observability theory.

We further extend our analysis and establish the RIP for certain symmetric matrices $A$. We believe
this analysis has important consequences in analyzing problems of practical interest such as diffusion
(see, for example, Section IV). Motivated by such an application, in particular, suppose that $A \in \mathbb{R}^{N \times N}$
is a positive semidefinite matrix with the eigendecomposition

$$A = U \Lambda U^T = \begin{bmatrix} U_1 & U_2 \end{bmatrix} \begin{bmatrix} \Lambda_1 & 0 \\ 0 & \Lambda_2 \end{bmatrix} \begin{bmatrix} U_1 & U_2 \end{bmatrix}^T, \tag{9}$$

where $U \in \mathbb{R}^{N \times N}$ is unitary, $\Lambda \in \mathbb{R}^{N \times N}$ is a diagonal matrix with non-negative entries, $U_1 \in \mathbb{R}^{N \times L}$,
$U_2 \in \mathbb{R}^{N \times (N-L)}$, $\Lambda_1 \in \mathbb{R}^{L \times L}$, and $\Lambda_2 \in \mathbb{R}^{(N-L) \times (N-L)}$. The submatrix $\Lambda_1$ contains the $L$ largest
eigenvalues of $A$. The value for $L$ can be chosen as desired; our results below give the strongest bounds
when all eigenvalues in $\Lambda_1$ are large compared to all eigenvalues in $\Lambda_2$. Let $\lambda_{1,\min}$ denote the smallest
entry of $\Lambda_1$, $\lambda_{1,\max}$ denote the largest entry of $\Lambda_1$, and $\lambda_{2,\max}$ denote the largest entry of $\Lambda_2$.

In the following, we show that in the special case where the matrix $U_1^T \in \mathbb{R}^{L \times N}$ ($L \le N$) happens
to itself satisfy the RIP (up to a scaling), then $\mathcal{O}_\Omega$ satisfies the RIP (up to a scaling). Although there
are many state transition matrices $A$ that do not have a collection of eigenvectors $U_1$ with this special
property, we do note that if $A$ is a circulant matrix, its eigenvectors will be the Discrete Fourier Transform
(DFT) basis vectors, and it is known that a randomly selected set of DFT basis vectors will satisfy the
RIP with high probability [25]. Other cases where $U_1$ could be modeled as being randomly generated
could also fit into this scenario, though such cases may primarily be of academic interest.
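Theorem 3: Assume $\Omega = \{k_0, k_1, \dots, k_{K-1}\}$. Assume $A$ has the eigendecomposition given in (9) and
$U_1^T \in \mathbb{R}^{L \times N}$ ($L \le N$) satisfies a scaled version³ of the RIP of order $S$ with isometry constant $\delta_S$.
Formally, assume for $\delta_S \in (0, 1)$ that

$$(1 - \delta_S)\frac{L}{N}\|x_0\|_2^2 \le \|U_1^T x_0\|_2^2 \le (1 + \delta_S)\frac{L}{N}\|x_0\|_2^2 \tag{10}$$

holds for all $S$-sparse $x_0 \in \mathbb{R}^N$. Assume each of the measurement matrices $C_k \in \mathbb{R}^{M \times N}$ is populated
with i.i.d. Gaussian random entries with mean zero and variance $\frac{1}{M}$. Assume all matrices $C_k$ are generated
independently of each other. Let $\nu \in (0, 1)$ denote a failure probability and $\delta \in \left(0, \frac{16}{\sqrt{K}}\right)$ denote a distortion
factor. Then with probability exceeding $1 - \nu$,

$$(1 - \delta)(1 - \delta_S)\frac{L}{N}\sum_{i=0}^{K-1}\lambda_{1,\min}^{2k_i} \le \frac{\|\mathcal{O}_\Omega x_0\|_2^2}{\|x_0\|_2^2} \le (1 + \delta)\left((1 + \delta_S)\frac{L}{N}\sum_{i=0}^{K-1}\lambda_{1,\max}^{2k_i} + \sum_{i=0}^{K-1}\lambda_{2,\max}^{2k_i}\right) \tag{11}$$

for all $S$-sparse $x_0 \in \mathbb{R}^N$ whenever

$$MK \ge \frac{512K\left(S\left(\log\frac{42}{\delta} + 1 + \log\frac{N}{S}\right) + \log\frac{2}{\nu}\right)}{\rho\,\delta^2}, \tag{12}$$

where

$$\rho := \inf_{S\text{-sparse } x_0 \in \mathbb{R}^N} \Gamma(\mathcal{A}_\Omega x_0) \tag{13}$$

and

$$\Gamma(\mathcal{A}_\Omega x_0) := \frac{\left(\|A^{k_0}x_0\|_2^2 + \|A^{k_1}x_0\|_2^2 + \cdots + \|A^{k_{K-1}}x_0\|_2^2\right)^2}{\|A^{k_0}x_0\|_2^4 + \|A^{k_1}x_0\|_2^4 + \cdots + \|A^{k_{K-1}}x_0\|_2^4}.$$

Proof: See Appendix A. ■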
The result of Theorem 3 is particularly interesting in applications where the largest eigenvalues of $A$
all cluster around each other and the rest of the eigenvalues cluster around zero. Put formally, we are
interested in applications where

$$0 \approx \lambda_{2,\max} \ll \frac{\lambda_{1,\min}}{\lambda_{1,\max}} \approx 1.$$

The following corollary of Theorem 3 considers an extreme case when $\lambda_{1,\max} = \lambda_{1,\min}$ and $\lambda_{2,\max} = 0$.

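Corollary 1: Assume $\Omega = \{k_0, k_1, \dots, k_{K-1}\}$. Assume each of the measurement matrices $C_k \in \mathbb{R}^{M \times N}$
is populated with i.i.d. Gaussian random entries with mean zero and variance $\frac{1}{M}$. Assume all
matrices $C_k$ are generated independently of each other. Suppose $A$ has the eigendecomposition given
in (9) and $U_1^T \in \mathbb{R}^{L \times N}$ ($L \le N$) satisfies a scaled version of the RIP of order $S$ with isometry constant
$\delta_S$ as given in (10). Assume $\lambda_{1,\max} = \lambda_{1,\min} = \lambda$ ($\lambda \neq 0$) and $\lambda_{2,\max} = 0$. Let $\nu \in (0, 1)$ denote a failure
probability and $\delta \in (0, 1)$ denote a distortion factor. Define $C := \sum_{i=0}^{K-1}\lambda^{2k_i}$ and $\delta'_S := \delta_S + \delta + \delta_S\delta$.
Then with probability exceeding $1 - \nu$,

$$(1 - \delta'_S)\|x_0\|_2^2 \le \left\|\sqrt{\frac{N}{LC}}\,\mathcal{O}_\Omega x_0\right\|_2^2 \le (1 + \delta'_S)\|x_0\|_2^2 \tag{14}$$

for all $S$-sparse $x_0 \in \mathbb{R}^N$ whenever

$$MK \ge \frac{512(1 + \delta_S)^2\lambda^{-4(k_{K-1} - k_0)}\left(S\left(\log\frac{42}{\delta} + 1 + \log\frac{N}{S}\right) + \log\frac{2}{\nu}\right)}{(1 - \delta_S)^2\delta^2}, \qquad \lambda < 1, \tag{15}$$

$$MK \ge \frac{512(1 + \delta_S)^2\lambda^{4(k_{K-1} - k_0)}\left(S\left(\log\frac{42}{\delta} + 1 + \log\frac{N}{S}\right) + \log\frac{2}{\nu}\right)}{(1 - \delta_S)^2\delta^2}, \qquad \lambda > 1. \tag{16}$$

Proof: See Appendix B. ■

³The $\frac{L}{N}$ scaling in (10) is to account for the unit-norm rows of $U_1^T$.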
While the result of Corollary 1 is generic and valid for any $\lambda$, an important RIP result can be obtained
when $\lambda = 1$. The following corollary states the result.

Corollary 2: Suppose the same notation and assumptions as in Corollary 1 and additionally assume
$\lambda = 1$. Then with probability exceeding $1 - \nu$,

$$(1 - \delta'_S)\|x_0\|_2^2 \le \left\|\sqrt{\frac{N}{LK}}\,\mathcal{O}_\Omega x_0\right\|_2^2 \le (1 + \delta'_S)\|x_0\|_2^2 \tag{17}$$

for all $S$-sparse $x_0 \in \mathbb{R}^N$ whenever

$$MK \ge \frac{512(1 + \delta_S)^2\left(S\left(\log\frac{42}{\delta} + 1 + \log\frac{N}{S}\right) + \log\frac{2}{\nu}\right)}{(1 - \delta_S)^2\delta^2}. \tag{18}$$

Proof: See Appendix B. ■
These results essentially indicate that the more $\lambda$ deviates from one, the more total measurements
$MK$ are required to ensure unique recovery of any $S$-sparse initial state $x_0$. The bounds on $\rho$ (which
we state in Appendix B to derive Corollaries 1 and 2 from Theorem 3) also indicate that when $\lambda \neq 1$,
the smallest number of measurements is required when the sample times are consecutive (i.e., when
$k_{K-1} - k_0 = K - 1$). Similar to what we mentioned earlier in our analysis for a scaled unitary $A$, when
$\lambda < 1$ the effect of the initial state will be highly attenuated as we take measurements at later times (i.e.,
when $k_{K-1} - k_0 > K - 1$), which results in a larger total number of measurements $MK$ sufficient for exact
recovery. By the symmetry between (15) and (16), the same penalty appears when $\lambda > 1$.
III. Concentration of Measure Inequalities and the Observability Matrix

In this section, we derive CoM inequalities for the observability matrix when the measurement matrices
$C_k$ are populated with i.i.d. Gaussian random entries. These inequalities are the foundation for establishing
the RIP presented in the previous section, via Lemma 1. However, they are also of independent interest
for other types of problems involving the states of dynamical systems, such as detection and classification
[20], [26], [27]. As mentioned earlier, we make a connection to the analysis for block diagonal matrices
from Compressive Sensing (CS). To begin, note that when $\Omega = \{k_0, k_1, \dots, k_{K-1}\}$ we can write

$$\mathcal{O}_\Omega = \underbrace{\begin{bmatrix} C_{k_0} & & & \\ & C_{k_1} & & \\ & & \ddots & \\ & & & C_{k_{K-1}} \end{bmatrix}}_{\mathcal{C}_\Omega \in \mathbb{R}^{\widetilde{M} \times \widetilde{N}}} \underbrace{\begin{bmatrix} A^{k_0} \\ A^{k_1} \\ \vdots \\ A^{k_{K-1}} \end{bmatrix}}_{\mathcal{A}_\Omega \in \mathbb{R}^{\widetilde{N} \times N}}, \tag{19}$$

where $\widetilde{M} := MK$ and $\widetilde{N} := NK$. In this decomposition, $\mathcal{C}_\Omega$ is a block diagonal matrix whose diagonal
blocks are the measurement matrices $C_k$. We derive CoM inequalities for two cases. We first consider
the case where all measurement matrices $C_k$ are generated independently of each other. We then consider
the case where all measurement matrices $C_k$ are the same.
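The factorization of the observability matrix into a block diagonal matrix of measurement matrices times a stack of powers of $A$ can be checked numerically. The sketch below builds $\mathcal{O}_\Omega$ directly as stacked blocks $C_k A^k$ and again as the product $\mathcal{C}_\Omega \mathcal{A}_\Omega$; the small sizes, the sample times, and the function names are illustrative assumptions.

```python
import numpy as np

def observability_matrix(A, C_list, times):
    """Stack the blocks C_k A^k, k in `times`, into an (M*K) x N matrix."""
    return np.vstack([C @ np.linalg.matrix_power(A, k)
                      for C, k in zip(C_list, times)])

def block_diagonal_factors(A, C_list, times):
    """Return (C_Omega, A_Omega): block diagonal C's and stacked powers of A."""
    M, N = C_list[0].shape
    K = len(times)
    C_Omega = np.zeros((M * K, N * K))
    for i, C in enumerate(C_list):
        # Place the i-th measurement matrix on the i-th diagonal block.
        C_Omega[i * M:(i + 1) * M, i * N:(i + 1) * N] = C
    A_Omega = np.vstack([np.linalg.matrix_power(A, k) for k in times])
    return C_Omega, A_Omega

rng = np.random.default_rng(0)
N, M = 6, 2
times = [0, 2, 3, 7]                       # arbitrary sample times k_0 < ... < k_{K-1}
A = rng.normal(size=(N, N)) / np.sqrt(N)   # arbitrary test matrix, not from the paper
C_list = [rng.normal(0.0, 1.0 / np.sqrt(M), size=(M, N)) for _ in times]

O = observability_matrix(A, C_list, times)
C_Omega, A_Omega = block_diagonal_factors(A, C_list, times)
print(O.shape, C_Omega.shape, A_Omega.shape)
print(np.allclose(O, C_Omega @ A_Omega))
```

In practice one would never form the $MK \times NK$ block diagonal matrix explicitly; it appears here only because the concentration analysis is stated in terms of it.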



A. Independent Random Measurement Matrices 

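In this section, we assume all matrices $C_k$ are generated independently of each other. Focusing just
on $\mathcal{C}_\Omega$ for the moment, we have the following bound on its concentration behavior.⁴

Theorem 4: [21] Assume each of the measurement matrices $C_k \in \mathbb{R}^{M \times N}$ is populated with i.i.d.
Gaussian random entries with mean zero and variance $\frac{1}{M}$. Assume all matrices $C_k$ are generated
independently of each other. Let $v_{k_0}, v_{k_1}, \dots, v_{k_{K-1}} \in \mathbb{R}^N$ and define

$$v = \begin{bmatrix} v_{k_0}^T & v_{k_1}^T & \cdots & v_{k_{K-1}}^T \end{bmatrix}^T \in \mathbb{R}^{KN}.$$

Then

$$\mathbf{P}\left\{ \left| \|\mathcal{C}_\Omega v\|_2^2 - \|v\|_2^2 \right| > \epsilon\|v\|_2^2 \right\} \le \begin{cases} 2\exp\left\{-\dfrac{M\epsilon^2\|\gamma\|_1^2}{256\|\gamma\|_2^2}\right\}, & 0 \le \epsilon \le \dfrac{16\|\gamma\|_2^2}{\|\gamma\|_\infty\|\gamma\|_1} \qquad (20) \\[2ex] 2\exp\left\{-\dfrac{M\epsilon\|\gamma\|_1}{16\|\gamma\|_\infty}\right\}, & \epsilon \ge \dfrac{16\|\gamma\|_2^2}{\|\gamma\|_\infty\|\gamma\|_1}, \qquad (21) \end{cases}$$

where

$$\gamma = \gamma(v) := \begin{bmatrix} \|v_{k_0}\|_2^2 & \|v_{k_1}\|_2^2 & \cdots & \|v_{k_{K-1}}\|_2^2 \end{bmatrix}^T \in \mathbb{R}^K.$$

⁴All results in Section III-A may be extended to the case where the matrices $C_k$ are populated with
sub-Gaussian random variables, as in [21].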

As we will be frequently concerned with applications where $\epsilon$ is small, consider the first of the cases
given in the right-hand side of the above bound. (It can be shown [21] that this case always permits any
value of $\epsilon$ between 0 and $\frac{16}{\sqrt{K}}$.) Define

$$\Gamma = \Gamma(v) := \frac{\|\gamma(v)\|_1^2}{\|\gamma(v)\|_2^2} = \frac{\left(\|v_{k_0}\|_2^2 + \|v_{k_1}\|_2^2 + \cdots + \|v_{k_{K-1}}\|_2^2\right)^2}{\|v_{k_0}\|_2^4 + \|v_{k_1}\|_2^4 + \cdots + \|v_{k_{K-1}}\|_2^4} \tag{22}$$

and note that for any $v \in \mathbb{R}^{KN}$, $1 \le \Gamma(v) \le K$. This simply follows from the standard relation that
$\|z\|_2 \le \|z\|_1 \le \sqrt{K}\|z\|_2$ for all $z \in \mathbb{R}^K$. The case $\Gamma(v) = K$ is quite favorable because the failure
probability will decay exponentially fast in the total number of measurements $MK$. A simple comparison
between this result and the CoM inequality for a dense Gaussian matrix stated in Definition 2 reveals
that we get the same degree of concentration from the $MK \times NK$ block diagonal matrix $\mathcal{C}_\Omega$ as we
would get from a dense $MK \times NK$ matrix populated with i.i.d. Gaussian random variables. This event
happens if and only if the components $v_{k_i}$ have equal energy, i.e., if and only if

$$\|v_{k_0}\|_2 = \|v_{k_1}\|_2 = \cdots = \|v_{k_{K-1}}\|_2.$$

On the other hand, the case $\Gamma(v) = 1$ is quite unfavorable and implies that we get the same degree of
concentration from the $MK \times NK$ block diagonal matrix $\mathcal{C}_\Omega$ as we would get from a dense Gaussian
matrix having size only $M \times NK$. This event happens if and only if $\|v_{k_i}\|_2 = 0$ for all $i \in \{0, 1, \dots, K-1\}$
but one $i$. Thus, more uniformity in the values of the $\|v_{k_i}\|_2$ ensures a higher probability of concentration.
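The quantity $\Gamma(v)$, the squared sum of the block energies $\|v_{k_i}\|_2^2$ divided by the sum of their squares, is easy to compute, and its two extreme cases can be reproduced directly. The following is a minimal sketch; the function name and the test vectors are illustrative assumptions.

```python
import numpy as np

def gamma_ratio(blocks):
    """Gamma(v) = (sum_i e_i)^2 / sum_i e_i^2, where e_i = ||v_i||_2^2 per block."""
    energies = np.array([float(np.sum(np.asarray(v, dtype=float) ** 2))
                         for v in blocks])
    if not np.any(energies > 0):
        raise ValueError("at least one block must be non-zero")
    return float(energies.sum() ** 2 / np.sum(energies ** 2))

K, N = 5, 8
rng = np.random.default_rng(0)

equal = [np.ones(N) for _ in range(K)]                         # equal energy in every block
single = [np.ones(N)] + [np.zeros(N) for _ in range(K - 1)]    # all energy in one block
generic = [s * rng.normal(size=N) for s in (1.0, 0.5, 2.0, 0.1, 1.5)]

print(gamma_ratio(equal))    # favorable extreme, equals K
print(gamma_ratio(single))   # unfavorable extreme, equals 1
print(gamma_ratio(generic))  # lies between 1 and K
```

Since $\Gamma$ multiplies $M$ in the exponent of the failure probability, it can be read as the effective number of sample times that contribute to concentration.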
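We now note that, when applying the observability matrix to an initial state, we will have

$$\mathcal{O}_\Omega x_0 = \mathcal{C}_\Omega \mathcal{A}_\Omega x_0.$$

This leads us to the following corollary of Theorem 4.

Corollary 3: Suppose the same notation and assumptions as in Theorem 4. Then for any fixed initial
state $x_0 \in \mathbb{R}^N$ and for any $\epsilon \in \left(0, \frac{16}{\sqrt{K}}\right)$,

$$\mathbf{P}\left\{ \left| \|\mathcal{O}_\Omega x_0\|_2^2 - \|\mathcal{A}_\Omega x_0\|_2^2 \right| > \epsilon\|\mathcal{A}_\Omega x_0\|_2^2 \right\} \le 2\exp\left\{-\frac{M\,\Gamma(\mathcal{A}_\Omega x_0)\,\epsilon^2}{256}\right\}. \tag{23}$$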
There are two important phenomena to consider in this result, and both are impacted by the interaction
of $A$ with $x_0$. First, on the left-hand side of (23), we see that the point of concentration of $\|\mathcal{O}_\Omega x_0\|_2^2$ is
around $\|\mathcal{A}_\Omega x_0\|_2^2$, where

$$\|\mathcal{A}_\Omega x_0\|_2^2 = \|A^{k_0}x_0\|_2^2 + \|A^{k_1}x_0\|_2^2 + \cdots + \|A^{k_{K-1}}x_0\|_2^2. \tag{24}$$

For a concentration bound of the same form as Definition 2, however, $\|\mathcal{O}_\Omega x_0\|_2^2$ should concentrate
around some constant multiple of $\|x_0\|_2^2$. In general, for different initial states $x_0$ and transition matrices
$A$, we may see widely varying ratios $\frac{\|\mathcal{A}_\Omega x_0\|_2^2}{\|x_0\|_2^2}$. However, further analysis is possible in scenarios where
this ratio is predictable and fixed. Second, on the right-hand side of (23), we see that the exponent of
the concentration failure probability scales with

$$\Gamma(\mathcal{A}_\Omega x_0) = \frac{\left(\|A^{k_0}x_0\|_2^2 + \|A^{k_1}x_0\|_2^2 + \cdots + \|A^{k_{K-1}}x_0\|_2^2\right)^2}{\|A^{k_0}x_0\|_2^4 + \|A^{k_1}x_0\|_2^4 + \cdots + \|A^{k_{K-1}}x_0\|_2^4}. \tag{25}$$

As mentioned earlier, $1 \le \Gamma(\mathcal{A}_\Omega x_0) \le K$. The case $\Gamma(\mathcal{A}_\Omega x_0) = K$ is quite favorable and happens
when $\|A^{k_0}x_0\|_2 = \|A^{k_1}x_0\|_2 = \cdots = \|A^{k_{K-1}}x_0\|_2$; this occurs when the state "energy" is preserved
over time. The case $\Gamma(\mathcal{A}_\Omega x_0) = 1$ is quite unfavorable and happens when $k_0 = 0$ and $x_0 \in \text{null}(A)$ for
$x_0 \neq 0$.
1) Unitary and Scaled Unitary System Matrices: In the special case where $A$ is unitary (i.e., $\|A^{k_i}x\|_2^2 =
\|x\|_2^2$ for all $x \in \mathbb{R}^N$ and for any power $k_i$), we can draw a particularly strong conclusion. Because a
unitary $A$ guarantees both that $\|\mathcal{A}_\Omega x_0\|_2^2 = K\|x_0\|_2^2$ and that $\Gamma(\mathcal{A}_\Omega x_0) = K$, we have the following
result.

Corollary 4: Suppose the same notation and assumptions as in Theorem 4. Assume $\Omega = \{k_0, k_1, \dots, k_{K-1}\}$.
Suppose that $A$ is a unitary operator. Then for any fixed initial state $x_0 \in \mathbb{R}^N$ and for any $\epsilon \in (0, 1)$,⁵

$$\mathbf{P}\left\{ \left| \left\|\tfrac{1}{\sqrt{K}}\mathcal{O}_\Omega x_0\right\|_2^2 - \|x_0\|_2^2 \right| > \epsilon\|x_0\|_2^2 \right\} \le 2\exp\left\{-\frac{MK\epsilon^2}{256}\right\}. \tag{26}$$

What this means is that we get the same degree of concentration from the $MK \times N$ matrix $\frac{1}{\sqrt{K}}\mathcal{O}_\Omega$
as we would get from a fully dense $MK \times N$ matrix populated with i.i.d. Gaussian random variables.
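The concentration behaviour for a unitary $A$ can be explored with a small Monte Carlo experiment. The sketch below draws a random orthogonal $A$, fixes a unit-norm $x_0$, and repeatedly draws independent Gaussian $C_k$ with variance $1/M$ at arbitrary sample times, recording $\|\frac{1}{\sqrt{K}}\mathcal{O}_\Omega x_0\|_2^2$. It then compares the empirical frequency of a deviation larger than $\epsilon$ with the value $2\exp\{-MK\epsilon^2/256\}$. All sizes, the seed, the sample times, and the helper names are illustrative assumptions; with such small $M$ and $K$ the bound is far from tight, so the check is only a sanity test.

```python
import numpy as np

def random_orthogonal(N, rng):
    """Random N x N orthogonal (real unitary) matrix via QR of a Gaussian matrix."""
    Q, _ = np.linalg.qr(rng.normal(size=(N, N)))
    return Q

def scaled_energy(states, M, rng):
    """One draw of ||(1/sqrt(K)) O_Omega x0||_2^2 using fresh, independent C_k."""
    N = states[0].shape[0]
    total = 0.0
    for s in states:
        C_k = rng.normal(0.0, 1.0 / np.sqrt(M), size=(M, N))
        total += np.sum((C_k @ s) ** 2)
    return total / len(states)

rng = np.random.default_rng(1)
N, M = 64, 4
times = [0, 3, 4, 9, 10, 17, 20, 31]       # arbitrary, non-consecutive sample times
K = len(times)
A = random_orthogonal(N, rng)
x0 = rng.normal(size=N)
x0 /= np.linalg.norm(x0)                   # unit norm, not sparse

# The states A^k x0 are deterministic, so compute them once.
states = [np.linalg.matrix_power(A, k) @ x0 for k in times]

energies = np.array([scaled_energy(states, M, rng) for _ in range(2000)])
eps = 0.5
empirical_failure = float(np.mean(np.abs(energies - 1.0) > eps))
com_bound = float(2.0 * np.exp(-M * K * eps ** 2 / 256.0))
print(energies.mean(), empirical_failure, com_bound)
```

Because $A$ is orthogonal, every state $A^{k}x_0$ has unit norm, which is exactly the equal-energy situation in which $\Gamma$ attains its maximum value $K$.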

⁵The observant reader may note that Corollary 3 requires $\epsilon$ to be less than $\frac{16}{\sqrt{K}}$. This restriction on $\epsilon$ appears so that we can
focus on the upper CoM inequality (20) and ignore the lower one (21). However, for most of the problems considered in this
paper (i.e., unitary, scaled unitary, and certain symmetric matrices $A$), we can actually apply (20) for a much broader range of
$\epsilon$ (up to 1 and even higher). In fact, we can show that in these settings,

$$\exp\left\{-\frac{M\epsilon^2\|\gamma\|_1^2}{256\|\gamma\|_2^2}\right\} \ge \exp\left\{-\frac{M\epsilon\|\gamma\|_1}{16\|\gamma\|_\infty}\right\}$$

for all $\epsilon \in (0, 1)$. Consequently, we allow $\epsilon \in (0, 1)$ in Corollaries 4 and 5. We have omitted these details for the sake of space.
Observe that this concentration result is valid for any $x_0 \in \mathbb{R}^N$ (not necessarily sparse) and can be used,
for example, to prove that finite point clouds [28] and low-dimensional manifolds [29] in $\mathbb{R}^N$ can have
stable, approximate distance-preserving embeddings under the matrix $\frac{1}{\sqrt{K}}\mathcal{O}_\Omega$. In each of these cases we
may be able to solve very powerful signal inference and recovery problems with $MK \ll N$.

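When $\Omega = \{0, 1, \dots, K-1\}$ (consecutive sample times), one can further derive CoM inequalities
when $A$ is a scaled unitary matrix (i.e., when $A = aU$ where $a \in \mathbb{R}$ ($a \neq 1$) and $U \in \mathbb{R}^{N \times N}$
is unitary).

Corollary 5: Suppose the same notation and assumptions as in Theorem 4. Assume $\Omega = \{0, 1, \dots, K-1\}$.
Suppose that $A = aU$ ($a \in \mathbb{R}$, $a \neq 0$) and $U \in \mathbb{R}^{N \times N}$ is unitary. Define $b := \sum_{k=0}^{K-1} a^{2k}$. Then for any
fixed initial state $x_0 \in \mathbb{R}^N$ and for any $\epsilon \in (0, 1)$,

$$\mathbf{P}\left\{ \left| \left\|\tfrac{1}{\sqrt{b}}\mathcal{O}_\Omega x_0\right\|_2^2 - \|x_0\|_2^2 \right| > \epsilon\|x_0\|_2^2 \right\} \le \begin{cases} 2\exp\left\{-\dfrac{MK\epsilon^2}{256\left((1 - a^2)K + a^2\right)}\right\}, & |a| < 1 \qquad (27) \\[2ex] 2\exp\left\{-\dfrac{MK\epsilon^2}{256\left((1 - a^{-2})K + a^{-2}\right)}\right\}, & |a| > 1. \qquad (28) \end{cases}$$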

Proof of Corollary 5: First note that when $A = aU$ ($U$ is unitary) and $\Omega = \{0, 1, \dots, K-1\}$ then

$$\|A_\Omega x_0\|_2^2 = \left(1 + a^2 + \cdots + a^{2(K-1)}\right)\|x_0\|_2^2 = b\|x_0\|_2^2.$$

From the definition of $\Gamma$, when $|a| < 1$,

$$\Gamma(A_\Omega x_0) = \frac{\left(1 + a^2 + \cdots + a^{2(K-1)}\right)^2}{1 + a^4 + \cdots + a^{4(K-1)}} = \frac{(1 - a^{2K})(1 + a^2)}{(1 + a^{2K})(1 - a^2)}. \tag{29}$$

Also observe that when $|a| < 1$ (a short argument is given in the footnote below),

$$\frac{1 - a^2}{1 - a^{2K}} \le (1 - a^2) + \frac{a^2}{K}. \tag{30}$$

Thus, from (29) and (30) and noting that $1 + a^2 \ge 1 + a^{2K}$ when $|a| < 1$,

$$\Gamma(A_\Omega x_0) \ge \frac{K}{(1 - a^2)K + a^2}.$$

Similarly, one can show that when $|a| > 1$,

$$\frac{1 - a^{-2}}{1 - a^{-2K}} \le (1 - a^{-2}) + \frac{a^{-2}}{K} \tag{31}$$

and consequently,

$$\Gamma(A_\Omega x_0) \ge \frac{K}{(1 - a^{-2})K + a^{-2}}.$$

With the appropriate scaling of $O_\Omega$ by $\frac{1}{\sqrt{b}}$, the CoM inequalities follow from Corollary 3. ■
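The quantities in this proof are easy to evaluate numerically. The sketch below computes $\Gamma(A_\Omega x_0)$ for $A = aU$ directly from the geometric sums, compares it with the closed form in (29), and checks it against the lower bounds that appear in the denominators of (27) and (28). The function names and the tested values of $a$ and $K$ are illustrative assumptions.

```python
def gamma_scaled_unitary(a, K):
    """Gamma for A = aU: (sum_k a^{2k})^2 / sum_k a^{4k}, with k = 0..K-1."""
    num = sum(a ** (2 * k) for k in range(K)) ** 2
    den = sum(a ** (4 * k) for k in range(K))
    return num / den

def gamma_closed_form(a, K):
    """Closed form (29); an algebraic identity whenever a^2 != 1."""
    return ((1 - a ** (2 * K)) * (1 + a ** 2)) / ((1 + a ** (2 * K)) * (1 - a ** 2))

def gamma_lower_bound(a, K):
    """K / ((1 - q) K + q) with q = a^2 for |a| < 1 and q = a^-2 for |a| > 1."""
    q = a ** 2 if abs(a) < 1 else a ** -2
    return K / ((1 - q) * K + q)

for a in (0.5, 0.9, 1.5):
    for K in (1, 2, 5, 20):
        print(a, K, gamma_scaled_unitary(a, K), gamma_lower_bound(a, K))
```

For $|a| < 1$ the closed form tends to $(1 + a^2)/(1 - a^2)$ as $K$ grows, so $\Gamma$ saturates: once the state has decayed, additional observation times add little to the exponent of the concentration bound.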

Footnote: In order to prove (30), for a given $|a| < 1$, let $C(a)$ be a constant such that for all $K$ ($K$ only takes positive integer values),
$\frac{1}{1 - a^{2K}} \le 1 + \frac{C(a)}{K}$. By this assumption, $C(a) \ge \frac{K a^{2K}}{1 - a^{2K}} =: g(a, K)$. Observe that for a given $|a| < 1$, $g(a, K)$ is a decreasing
function of $K$ and its maximum is achieved when $K = 1$. Choosing $C(a) = g(a, 1) = \frac{a^2}{1 - a^2}$ completes the proof of (30).
2) Implications for the RIP: As mentioned earlier, our CoM inequalities have immediate implications
in establishing the RIP for the observability matrix. Based on Definition 2 and Lemma 1, in this section
we prove Theorems 1 and 2.

Proof of Theorem 1: In order to establish the RIP based on Lemma 1, we simply need to evaluate
$f(\epsilon)$ in our CoM result derived in Corollary 5. One can easily verify that

$$f(\epsilon) =
\begin{cases}
\dfrac{\epsilon^2}{256\left((1 - a^2)K + a^2\right)}, & |a| < 1 \quad (32) \\[2ex]
\dfrac{\epsilon^2}{256\left((1 - a^{-2})K + a^{-2}\right)}, & |a| > 1. \quad (33)
\end{cases}$$

Through a union bound argument and by applying Lemma 1 for all $\binom{N}{S}$ $S$-dimensional subspaces in
$\mathbb{R}^N$, the RIP result follows. ■

Proof of Theorem 2: In order to establish the RIP based on Lemma 1, we simply need to evaluate
$f(\epsilon)$ in our CoM result derived in Corollary 4. In this case,

$$f(\epsilon) = \frac{\epsilon^2}{256}.$$

Through a union bound argument and by applying Lemma 1 for all $\binom{N}{S}$ $S$-dimensional subspaces in
$\mathbb{R}^N$, the RIP result follows. ■



B. Identical Random Measurement Matrices 

In this section, we consider the case where all matrices $C_k$ are identical and equal to some $M \times N$
matrix $C$ which is populated with i.i.d. Gaussian entries having zero mean and variance $\frac{1}{M}$. Once
again note that we can write $O_\Omega = C_\Omega A_\Omega$, where this time

$$C_\Omega := \begin{bmatrix} C & & & \\ & C & & \\ & & \ddots & \\ & & & C \end{bmatrix} \in \mathbb{R}^{MK \times NK}, \tag{34}$$

and $A_\Omega$ is as defined in (19). The matrix $C_\Omega$ is block diagonal with equal blocks on its main diagonal,
and we have the following bound on its concentration behavior.

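Theorem 5: [21] Assume each of the measurement matrices $C_k \in \mathbb{R}^{M \times N}$ is populated with i.i.d.
Gaussian random entries with mean zero and variance $\frac{1}{M}$. Assume all matrices $C_k$ are the same (i.e.,
$C_k = C$, $\forall k$). Let $v_{k_0}, v_{k_1}, \dots, v_{k_{K-1}} \in \mathbb{R}^N$ and define

$$v = \begin{bmatrix} v_{k_0}^T & v_{k_1}^T & \cdots & v_{k_{K-1}}^T \end{bmatrix}^T \in \mathbb{R}^{KN}.$$

Then,

$$P\left\{ \left| \|C_\Omega v\|_2^2 - \|v\|_2^2 \right| > \epsilon \|v\|_2^2 \right\} \le
\begin{cases}
2\exp\left\{ -\dfrac{M\epsilon^2\|\lambda\|_1^2}{256\|\lambda\|_2^2} \right\}, & 0 \le \epsilon \le \dfrac{16\|\lambda\|_2^2}{\|\lambda\|_\infty\|\lambda\|_1} \\[2ex]
2\exp\left\{ -\dfrac{M\epsilon\|\lambda\|_1}{16\|\lambda\|_\infty} \right\}, & \epsilon \ge \dfrac{16\|\lambda\|_2^2}{\|\lambda\|_\infty\|\lambda\|_1},
\end{cases}$$

where

$$\lambda = \lambda(v) := \begin{bmatrix} \lambda_1 & \lambda_2 & \cdots & \lambda_{\min(K,N)} \end{bmatrix}^T \in \mathbb{R}^{\min(K,N)} \tag{35}$$

and $\{\lambda_1, \lambda_2, \dots, \lambda_{\min(K,N)}\}$ are the first (non-zero) eigenvalues of the $K \times K$ matrix $V^T V$, where

$$V = \begin{bmatrix} v_{k_0} & v_{k_1} & \cdots & v_{k_{K-1}} \end{bmatrix} \in \mathbb{R}^{N \times K}.$$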



Consider the first of the cases given in the right-hand side of the above bound. (This case permits any
value of $\epsilon$ between 0 and $\frac{16}{\sqrt{\min(K,N)}}$.) Define

$$\Lambda(v) := \frac{\|\lambda(v)\|_1^2}{\|\lambda(v)\|_2^2}$$

and note that for any $v \in \mathbb{R}^{NK}$, $1 \le \Lambda(v) \le \min(K, N)$. Moving forward, we will assume for simplicity
that $K \le N$, although this assumption can be removed. The case $\Lambda(v) = K$ is quite favorable and implies
that we get the same degree of concentration from the $MK \times NK$ block diagonal matrix $C_\Omega$ as we would
get from a dense $MK \times NK$ matrix populated with i.i.d. Gaussian random variables. This event happens
if and only if $\lambda_1 = \lambda_2 = \cdots = \lambda_K$, which happens if and only if

$$\|v_{k_0}\|_2 = \|v_{k_1}\|_2 = \cdots = \|v_{k_{K-1}}\|_2$$

and $\langle v_{k_i}, v_{k_\ell} \rangle = 0$ for all $0 \le i, \ell \le K-1$ with $i \ne \ell$. On the other hand, the case $\Lambda(v) = 1$ is quite
unfavorable and implies that we get the same degree of concentration from the $MK \times NK$ block diagonal
matrix $C_\Omega$ as we would get from a dense Gaussian matrix having only $M$ rows. This event happens if
and only if the dimension of $\mathrm{span}\{v_{k_0}, v_{k_1}, \dots, v_{k_{K-1}}\}$ equals 1. Thus, comparing to Section III-A,
uniformity in the norms of the vectors $v_{k_i}$ is no longer sufficient for a high probability of concentration;
in addition to this we must have diversity in the directions of the $v_{k_i}$.

The following corollary of Theorem 5 derives a CoM inequality for the observability matrix. Recall
that $O_\Omega x_0 = C_\Omega A_\Omega x_0$ where $C_\Omega$ is a block diagonal matrix whose diagonal blocks are repeated.

Corollary 6: Suppose the same notation and assumptions as in Theorem 5 and suppose $K \le N$. Then
for any fixed initial state $x_0 \in \mathbb{R}^N$ and for any $\epsilon \in \left(0, \frac{16}{\sqrt{K}}\right)$,

$$P\left\{ \left| \|O_\Omega x_0\|_2^2 - \|A_\Omega x_0\|_2^2 \right| > \epsilon \|A_\Omega x_0\|_2^2 \right\} \le 2\exp\left\{ -\frac{M\Lambda(A_\Omega x_0)\epsilon^2}{256} \right\}. \tag{37}$$

Once again, there are two important phenomena to consider in this result, and both are impacted by
the interaction of $A$ with $x_0$. First, on the left-hand side of (37), we see that the point of concentration
of $\|O_\Omega x_0\|_2^2$ is around $\|A_\Omega x_0\|_2^2$. Second, on the right-hand side of (37), we see that the exponent of
the concentration failure probability scales with $\Lambda(A_\Omega x_0)$, which is determined by the eigenvalues of
the $K \times K$ Gram matrix $V^T V$, where

$$V = \begin{bmatrix} A^{k_0}x_0 & A^{k_1}x_0 & \cdots & A^{k_{K-1}}x_0 \end{bmatrix} \in \mathbb{R}^{N \times K}.$$

As mentioned earlier, $1 \le \Lambda(A_\Omega x_0) \le K$. The case $\Lambda(A_\Omega x_0) = K$ is quite favorable and happens
when $\|A^{k_0}x_0\|_2 = \|A^{k_1}x_0\|_2 = \cdots = \|A^{k_{K-1}}x_0\|_2$ and $\langle A^{k_i}x_0, A^{k_\ell}x_0 \rangle = 0$ for all $0 \le i, \ell \le K-1$
with $i \ne \ell$. The case $\Lambda(A_\Omega x_0) = 1$ is quite unfavorable and happens if the dimension of
$\mathrm{span}\{A^{k_0}x_0, A^{k_1}x_0, \dots, A^{k_{K-1}}x_0\}$ equals 1.

In the special case where $A$ is unitary, we know that $\|A_\Omega x_0\|_2^2 = K\|x_0\|_2^2$. However, a unitary system
matrix does not guarantee a favorable value for $\Lambda(A_\Omega x_0)$. Indeed, if $A = I_{N \times N}$ we obtain the worst
case value $\Lambda(A_\Omega x_0) = 1$. If, on the other hand, $A$ acts as a rotation that takes a state into an orthogonal
subspace, we will have a stronger result.
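The two extreme cases can be reproduced with a few lines of code. The sketch below assumes that $\Lambda$ is the ratio $\|\lambda\|_1^2 / \|\lambda\|_2^2$ of the eigenvalues of the Gram matrix $V^T V$ (the quantity that multiplies $M\epsilon^2/256$ in the first case of Theorem 5), and evaluates it through the trace and Frobenius norm of the Gram matrix, so no eigenvalue solver is needed. The cyclic-shift matrix standing in for an "orthogonal rotation", the sizes, and the function names are illustrative assumptions.

```python
def gram(vectors):
    """K x K Gram matrix V^T V for a list of K vectors in R^N."""
    return [[sum(p * q for p, q in zip(u, w)) for w in vectors] for u in vectors]

def big_lambda(vectors):
    """||lambda||_1^2 / ||lambda||_2^2 for the eigenvalues lambda of V^T V.

    For a symmetric positive semidefinite matrix G, trace(G) is the sum of
    the eigenvalues and ||G||_F^2 is the sum of their squares.
    """
    g = gram(vectors)
    trace = sum(g[i][i] for i in range(len(g)))
    frob2 = sum(entry * entry for row in g for entry in row)
    return trace ** 2 / frob2

def cyclic_shift(x, k):
    """Apply A^k to x for the (unitary) cyclic-shift permutation A."""
    n = len(x)
    return [x[(j - k) % n] for j in range(n)]

N, K = 8, 4
x0 = [1.0] + [0.0] * (N - 1)                            # a 1-sparse initial state

identity_orbit = [x0 for _ in range(K)]                 # A = I: A^k x0 = x0
shift_orbit = [cyclic_shift(x0, k) for k in range(K)]   # A = cyclic shift

print(big_lambda(identity_orbit), big_lambda(shift_orbit))
```

With $A = I$ all $K$ columns of $V$ coincide and the value is 1; with the shift, the columns are orthonormal and the value is $K$. Whether the favorable case occurs depends on $x_0$ as well as on $A$: a constant vector, for example, is unchanged by the same shift.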

Corollary 7: Suppose the same notation and assumptions as in Theorem 5 and suppose $K \le N$.
Suppose that $A$ is a unitary operator. Suppose also that $\langle A^{k_i}x_0, A^{k_\ell}x_0 \rangle = 0$ for all $0 \le i, \ell \le K-1$
with $i \ne \ell$. Then for any fixed initial state $x_0 \in \mathbb{R}^N$ and for any $\epsilon \in \left(0, \frac{16}{\sqrt{K}}\right)$,

$$P\left\{ \left| \left\| \tfrac{1}{\sqrt{K}} O_\Omega x_0 \right\|_2^2 - \|x_0\|_2^2 \right| > \epsilon \|x_0\|_2^2 \right\} \le 2\exp\left\{ -\frac{MK\epsilon^2}{256} \right\}. \tag{38}$$

This result requires a particular relationship between $A$ and $x_0$, namely that $\langle A^{k_i}x_0, A^{k_\ell}x_0 \rangle = 0$ for
all $0 \le i, \ell \le K-1$ with $i \ne \ell$. Thus, given a particular system matrix $A$, it is possible that it might
hold for some $x_0$ and not others. One must therefore be cautious in using this concentration result for
CS applications (such as proving the RIP) that involve applying the concentration bound to a prescribed
collection of vectors [16]; one must ensure that the "orthogonal rotation" property holds for each vector
in the prescribed set.
IV. Case Study: Estimating the Initial State in a Diffusion Process

So far we have presented theorems that give a sufficient number of measurements for stable recovery
of a sparse initial state under certain conditions on the state transition matrix and under the assumption
that the measurement matrices are independent and populated with random entries. In this section, we
use a case study to illustrate some of the phenomena raised in the previous sections.



A. System Model 

We consider the problem of estimating the initial state of a system governed by the diffusion equation

$$\frac{\partial x}{\partial t} = \nabla \cdot \left( D(p) \nabla x(p, t) \right),$$

where $x(p, t)$ is the concentration, or density, at position $p$ at time $t$, and $D(p)$ is the diffusion coefficient
at position $p$. If $D$ is independent of position, then this simplifies to

$$\frac{\partial x}{\partial t} = D \nabla^2 x(p, t).$$

The boundary conditions can vary according to the surroundings of the domain $\Pi$. If $\Pi$ is bounded
by an impermeable surface (e.g., a lake surrounded by the shore), then the boundary conditions are
$n(p) \cdot \frac{\partial x}{\partial p}\Big|_{p \in \partial\Pi} = 0$, where $n(p)$ is the normal to $\partial\Pi$ at $p$. We will work with an approximate model
discretized in time and in space. For simplicity, we explain a one-dimensional (one spatial dimension)
diffusion process here but a similar approach of discretization can be taken for a diffusion process with
two or three spatial dimensions. Let $p := \begin{bmatrix} p(1) & p(2) & \cdots & p(N) \end{bmatrix}^T$ be a vector of equally-spaced
locations with spacing $\Delta_s$, and let $x(p, t) := \begin{bmatrix} x(p(1), t) & x(p(2), t) & \cdots & x(p(N), t) \end{bmatrix}^T$. Then a first
difference approximation in space gives the model

$$\dot{x}(p, t) = G x(p, t), \tag{39}$$

where $G$ represents the discrete Laplacian. We have

$$G = -L = \frac{D}{\Delta_s^2}
\begin{bmatrix}
-1 & 1 & 0 & 0 & \cdots & 0 \\
1 & -2 & 1 & 0 & \cdots & 0 \\
0 & 1 & -2 & 1 & \cdots & 0 \\
\vdots & & \ddots & \ddots & \ddots & \vdots \\
0 & \cdots & 0 & 1 & -2 & 1 \\
0 & \cdots & 0 & 0 & 1 & -1
\end{bmatrix},$$

where $L$ is the Laplacian matrix associated with a path (one spatial dimension). This discrete Laplacian
$G$ has eigenvalues $\lambda_i = -\frac{2D}{\Delta_s^2}\left(1 - \cos\left(\frac{\pi}{N}(i - 1)\right)\right)$ for $i = 1, 2, \dots, N$.
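The eigenvalue expression can be verified without an eigenvalue solver, because the path-graph Laplacian has explicit cosine eigenvectors. The sketch below builds $G$ and checks $Gv = \lambda_i v$ for each $i$, assuming the eigenvalues are $\lambda_i = -\frac{2D}{\Delta_s^2}\left(1 - \cos\left(\frac{\pi (i-1)}{N}\right)\right)$ and the eigenvectors have entries $\cos\left(\frac{\pi (i-1)}{N}(j + \tfrac{1}{2})\right)$ for $j = 0, \dots, N-1$. The function names are hypothetical.

```python
import math

def path_laplacian_G(N, D=1.0, delta_s=1.0):
    """G = (D / delta_s^2) * tridiag(1, -2, 1), with -1 in the two corner entries."""
    scale = D / delta_s ** 2
    G = [[0.0] * N for _ in range(N)]
    for j in range(N):
        if j > 0:
            G[j][j - 1] = scale
        if j < N - 1:
            G[j][j + 1] = scale
        # Interior nodes have two neighbours, end nodes only one.
        neighbours = (j > 0) + (j < N - 1)
        G[j][j] = -scale * neighbours
    return G

def eigenpair(i, N, D=1.0, delta_s=1.0):
    """i-th eigenvalue (i = 1..N) and a matching cosine eigenvector of G."""
    theta = math.pi * (i - 1) / N
    lam = -(2.0 * D / delta_s ** 2) * (1.0 - math.cos(theta))
    vec = [math.cos(theta * (j + 0.5)) for j in range(N)]
    return lam, vec

def residual(G, lam, vec):
    """Largest entry of |G v - lam v|."""
    return max(abs(sum(g * v for g, v in zip(row, vec)) - lam * x)
               for row, x in zip(G, vec))

N = 100
G = path_laplacian_G(N)
worst = max(residual(G, *eigenpair(i, N)) for i in range(1, N + 1))
print(worst)
```

The eigenvalue for $i = 1$ is zero with a constant eigenvector, which reflects conservation of total concentration under the impermeable boundary conditions.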
Fig. 1: One-dimensional diffusion process. At time zero, the concentration (the state) is non-zero only at
a few locations of the path graph of $N = 100$ nodes. (Axes: Time (s) and Position (index).)



To obtain a discrete-time model, we choose a sampling time $T_s$, and let the vector $x_k = x(p, kT_s)$ be the
concentration at positions $p(1), p(2), \dots, p(N)$ at sampling time $k$. Using a first difference approximation
in time, we have

$$x_k = A x_{k-1},$$

where $A = I_N + G T_s$. For a diffusion process with two spatial dimensions, a similar analysis would
follow, except one would use the Laplacian matrix of a grid (instead of the Laplacian matrix of a
one-dimensional path) in $A = I_N + G T_s$. For all simulations in this section we take $D = 1$, $\Delta_s = 1$, $N = 100$,
and $T_s = 0.1$. An example simulation of a one-dimensional diffusion is shown in Figure 1 where we
have initialized the system with a sparse initial state $x_0$ containing unit impulses at $S = 10$ randomly
chosen locations.
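A minimal sketch of this one-dimensional simulation follows, assuming the update $x_k = (I_N + G T_s)x_{k-1}$ with the path-graph $G$ and the parameter values $D = 1$, $\Delta_s = 1$, $N = 100$, $T_s = 0.1$, $S = 10$ quoted above. The random seed, the number of steps, and the function name are illustrative assumptions.

```python
import random

def diffusion_step(x, D=1.0, delta_s=1.0, Ts=0.1):
    """One step x_k = (I + G*Ts) x_{k-1} for the path-graph Laplacian G."""
    n = len(x)
    r = D * Ts / delta_s ** 2
    nxt = []
    for j in range(n):
        # End nodes have a single neighbour (no flux through the boundary).
        left = x[j - 1] if j > 0 else x[j]
        right = x[j + 1] if j < n - 1 else x[j]
        nxt.append(x[j] + r * (left - 2.0 * x[j] + right))
    return nxt

rng = random.Random(1)
N, S, steps = 100, 10, 50
x = [0.0] * N
for j in rng.sample(range(N), S):
    x[j] = 1.0                     # unit impulses at S random locations

history = [x]
for _ in range(steps):
    history.append(diffusion_step(history[-1]))

print(sum(history[0]), sum(history[-1]), max(history[-1]))
```

The total concentration is preserved from step to step while the peaks flatten, which is the qualitative behavior that makes very late measurements less informative about $x_0$.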



In Section IV-C we provide several simulations which demonstrate that recovery of a sparse initial
state is possible from compressive measurements.

B. Diffusion and its Connections to Theorem 3

Before presenting the recovery results from compressive measurements, we would like to mention
that our analysis in Theorem 3 gives some insight into (but is not precisely applicable to) the diffusion
problem. In particular, the discrete Laplacian matrix $G$ and the corresponding state transition matrix $A$
(see below) are almost circulant, and so their eigenvectors will closely resemble the DFT basis vectors.
The largest eigenvalues correspond to the lowest frequencies, and so the $U_1$ matrix corresponding to
$G$ or $A$ will resemble a basis of the lowest frequency DFT vectors. While such a matrix does not
technically satisfy the RIP, matrices formed from random sets of DFT vectors do satisfy the RIP with
high probability [25]. Thus, even though we cannot apply Theorem 3 directly to the diffusion problem,
it does provide some intuition that sparse recovery should be possible in the diffusion setting.



C. State Recovery From Compressive Measurements 

In this section, we consider a two-dimensional diffusion process. As mentioned earlier, the state
transition matrix $A$ associated with this process is of the form $A = I_N + G T_s$, where $T_s$ is the sampling
time and $G$ is the Laplacian matrix of a grid. In these simulations, we consider a grid of size $10 \times 10$
with $T_s = 0.1$.

We also consider two types of measuring processes. We first look at random measurement matrices
$C_k \in \mathbb{R}^{M \times N}$ where the entries of each matrix are i.i.d. Gaussian random variables with mean zero and
variance $\frac{1}{M}$. Note that this type of measurement matrix falls within the assumptions of our theorems
in Sections II and III-A. In this measuring scenario, all of the nodes of the grid (i.e., all of the states)
will be measured at each sample time. Formally, at each observation time we record a random linear
combination of all nodes. In the following, we refer to such measurements as "Dense Measurements."
Figure 2(a) illustrates an example of how the random weights are spread over the grid. The weights (the
entries of each row of each $C_k$) are shown using grayscale. The darker the node color, the higher the
corresponding weight. We also consider a more practical measuring process in which at each sample
time the operator measures the nodes of the grid occurring along a line with random slope and random
intercept. Formally, $C_k(i, j) = \exp\left(-\frac{d_k(i, j)}{c}\right)$ where $d_k(i, j)$ is the perpendicular distance of node $j$
($j = 1, \dots, N$) to the $i$th ($i = 1, \dots, M$) line with random slope and random intercept and $c$ is an
absolute constant that determines how fast the node weights decrease as their distances increase from
the line. Figure 2(b) illustrates an example of how the weights are spread over the grid in this scenario.
Observe that the nodes that are closer to the line are darker, indicating higher weights for those nodes.
We refer to such measurements as "Line Measurements."
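To make the Line Measurements concrete, the following sketch builds one row of a measurement matrix from the weight rule $\exp(-d/c)$, where $d$ is the perpendicular distance from a grid node to a line. Several details are assumptions made only for illustration: nodes sit at integer coordinates $(x, y)$ with $0 \le x, y \le 9$ and are stored in row-major order, the line is parametrized as $y = \text{slope} \cdot x + \text{intercept}$ (so vertical lines are excluded), and $c = 1$.

```python
import math

def line_measurement_row(slope, intercept, grid_size=10, c=1.0):
    """Weights exp(-d / c) for every node of a grid_size x grid_size grid,
    where d is the perpendicular distance to the line y = slope*x + intercept."""
    norm = math.sqrt(slope ** 2 + 1.0)
    row = []
    for y in range(grid_size):
        for x in range(grid_size):
            d = abs(slope * x - y + intercept) / norm
            row.append(math.exp(-d / c))
    return row   # length N = grid_size**2: one row of a measurement matrix

# Hypothetical example: a horizontal line through the grid row y = 3.
row = line_measurement_row(slope=0.0, intercept=3.0)
print(row[3 * 10 + 4], row[5 * 10 + 4])
```

Stacking $M$ such rows, each with its own randomly drawn slope and intercept, would give one $M \times N$ matrix of this type. Unlike the Dense Measurements, all weights are positive and concentrated near the line.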

To address the problem of recovering the initial state $x_0$, let us first consider the situation where we
collect measurements only of $x_0 \in \mathbb{R}^N$ itself. We fix the sparsity level of $x_0$ to $S = 9$. For various
values of $M$, we construct measurement matrices $C_0$ according to the two models explained above. At
each trial, we collect the measurements $y_0 = C_0 x_0$ and attempt to recover $x_0$ given $y_0$ and $C_0$ using
Fig. 2: Dense Measurements versus Line Measurements. The color of a node indicates the corresponding
weight of that node. The darker the node color, the higher the weight. These weights are the entries of
each row of each $C_k$. (a) Dense Measurements. The weights are drawn from a Gaussian distribution
with mean zero and variance $\frac{1}{M}$. These values are random and change for each measurement. (b) Line
Measurements. The weights are generated as a function of the perpendicular distances of all nodes of the
grid to the line. The slope and the intercept of the line are random and change for each measurement.



the canonical $\ell_1$-minimization problem from CS:

$$\hat{x}_0 = \arg\min_{x \in \mathbb{R}^N} \|x\|_1 \quad \text{subject to} \quad y_k = C_k A^k x \tag{40}$$

with $k = 0$. (In the next paragraph, we repeat this experiment for different $k$.) In order to imitate what
might happen in reality (e.g., a drop of poison being introduced to a lake of water at $k = 0$), we assume
the initial contaminant appears in a cluster of nodes on the associated diffusion grid. In our simulations,
we assume the $S = 9$ non-zero entries of the initial state correspond to a $3 \times 3$ square-neighborhood of
nodes on the grid. For each $M$, we repeat the recovery problem for 300 trials; in each trial we generate
a random sparse initial state $x_0$ (an initial state with a random location of the $3 \times 3$ square and random
values of the 9 non-zero entries) and a measurement matrix $C_0$ as explained above.

Figure 3(a) depicts, as a function of $M$, the percent of trials (with $x_0$ and $C_0$ randomly chosen in
each trial) in which the initial state is recovered perfectly, i.e., $\hat{x}_0 = x_0$. Naturally, we see that as we
Fig. 3: Signal recovery from compressive measurements of a diffusion process which has initiated from
a sparse initial state of dimension $N = 100$ and sparsity level $S = 9$. The plots show the percent
of trials (out of 300 trials in total) with perfect recovery of the initial state $x_0$ versus the number of
measurements $M$. (a) Recovery from compressive measurements at time $k = 0$. (b) Recovery from
compressive measurements at time $k = 10$. (Horizontal axis in both panels: Measurements ($M$).)



take more measurements, the recovery rate increases. When Line Measurements are taken, with almost
35 measurements we recover every sparse initial state of dimension 100 with sparsity level 9. When
Dense Measurements are employed, however, we observe a slightly weaker recovery performance at
$k = 0$, as almost 45 measurements are required to see exact recovery. In order to see how the diffusion
phenomenon affects the recovery, we repeat the same experiment at $k = 10$. In other words, we collect
the measurements $y_{10} = C_{10} x_{10} = C_{10} A^{10} x_0$ and attempt to recover $x_0$ given $y_{10}$ and $C_{10} A^{10}$ using the
canonical $\ell_1$-minimization problem (40). As shown in Fig. 3(b), the recovery performance is improved
when Line and Dense Measurements are employed (with almost 25 measurements exact recovery is
possible). Qualitatively, this suggests that due to diffusion, at $k = 10$, the initial contaminant is now
propagating and consequently a larger surface of the lake (corresponding to more nodes of the grid)
is contaminated. In this situation, a higher number of contaminated nodes will be measured by Line
Measurements which potentially can improve the recovery performance of the initial state.

In order to see how the recovery performance would change as we take measurements at different
times, we repeat the previous example for $k \in \{0, 1, 2, 8, 50, 100\}$. The results are shown in Fig. 4(a)
and Fig. 4(b) for Dense and Line Measurements, respectively. In both cases, the recovery performance
starts to improve as we take measurements at later times. However, in both measuring scenarios, the
Fig. 4: Signal recovery from compressive measurements of a diffusion process which has initiated from
a sparse initial state of dimension $N = 100$ and sparsity level $S = 9$. The plots show the percent
of trials (out of 300 trials in total) with perfect recovery of the initial state $x_0$ versus the number of
measurements $M$ taken at observation times $k \in \{0, 1, 2, 8, 50, 100\}$. (a) Recovery from compressive
Dense Measurements. (b) Recovery from compressive Line Measurements.



recovery performance tends to decrease if we wait too long to take measurements. For example, as
shown in Fig. 4(a), the recovery performance is significantly decreased at time $k = 100$ when Dense
Measurements are employed. A more dramatic decrease in the recovery performance can be observed
when Line Measurements are employed in Fig. 4(b). Again this behavior is as expected and can be
explained by the diffusion phenomenon. If we wait too long to take measurements from the field of
study (e.g., the lake of water), the effect of the initial contaminant starts to disappear in the field (due
to diffusion) and consequently measurements at later times contain less information. In summary, one
could conclude from these observations that taking compressive measurements of a diffusion process at
times that are too early or too late might decrease the recovery performance.

In another example, we fix $M = 32$, consider the same model for the sparse initial states with $S = 9$
as in the previous examples, introduce white noise in the measurements with standard deviation 0.05,
use a noise-aware version of the $\ell_1$ recovery algorithm [7], and plot a histogram of the recovery errors
$\|\hat{x}_0 - x_0\|_2$. We perform this experiment at $k = 2$ and $k = 10$. As can be seen in Fig. 5(a), at time $k = 2$
the Dense Measurements have lower recovery errors (almost half) compared to the Line Measurements.
However, if we take measurements at time $k = 10$, the recovery error of both measurement processes
tends to be similar, as depicted in Fig. 5(b).
Fig. 5: Signal recovery from $M = 32$ compressive measurements of a diffusion process which has
initiated from a sparse initial state of dimension $N = 100$ and sparsity level $S = 9$. The plots show the
recovery error of the initial state $\|e\|_2 = \|\hat{x}_0 - x_0\|_2$ over 300 trials. (a) Recovery from compressive
measurements at time $k = 2$. (b) Recovery from compressive measurements at time $k = 10$. (Legend in
both panels: Dense Measurements, Line Measurements.)



Of course, it is not necessary to take all of the measurements only at one observation time. What may
not be obvious a priori is how spreading the measurements over time may impact the initial state recovery.
To this end, we perform the signal recovery experiments when a total of $MK = 32$ measurements are
spread over $K = 4$ observation times (at each observation time we take $M = 8$ measurements). In
order to see how different observation times affect the recovery performance, we repeat the experiment
for different sample sets $\Omega_i$. We consider 10 sample sets as $\Omega_1 = \{0, 1, 2, 3\}$, $\Omega_2 = \{4, 5, 6, 7\}$, $\Omega_3 =
\{8, 9, 10, 11\}$, $\Omega_4 = \{10, 20, 30, 40\}$, $\Omega_5 = \{20, 21, 22, 23\}$, $\Omega_6 = \{10, 30, 50, 70\}$, $\Omega_7 = \{51, 52, 53, 54\}$,
$\Omega_8 = \{60, 70, 80, 90\}$, $\Omega_9 = \{91, 92, 93, 94\}$, and $\Omega_{10} = \{97, 98, 99, 100\}$. Figure 6(a) illustrates the
results. For both of the measuring scenarios, the overall recovery performance improves when we take
measurements at later times. As mentioned earlier, however, if we wait too long to take measurements
the recovery performance drops. For a range of sample sets beginning with $\Omega_2$, we have perfect recovery of the initial
state from only $MK = 32$ total measurements, using either Dense or Line Measurements. The overall
recovery performance is not much different compared to, say, taking $M = 32$ measurements at a single
instant and so there is no significant penalty that one pays by slightly spreading out the measurement
collection process in time, as long as a different random measurement matrix is used at each sample
time. We repeat the same experiment when the measurements are noisy. We introduce white noise in
the measurements with standard deviation 0.05 and use a noise-aware version of the $\ell_1$-minimization
Fig. 6: Signal recovery from compressive measurements of a diffusion process which has initiated from
a sparse initial state of dimension $N = 100$ and sparsity level $S = 9$. A total of $KM = 32$ measurements
are spread over $K = 4$ observation times while at each time, $M = 8$ measurements are taken. (a) Percent
of trials (out of 300 trials in total) with perfect recovery of the initial state $x_0$ are shown for different
sample sets $\Omega_i$. (b) Recovery error of the initial state $\|e\|_2 = \|\hat{x}_0 - x_0\|_2$ over 300 trials for set $\Omega_4$.



problem to recover the true solution. Figure 6(b) depicts a histogram of the recovery errors $\|\hat{x}_0 - x_0\|_2$
when $MK = 32$ measurements are spread over $K = 4$ sample times $\Omega_4 = \{10, 20, 30, 40\}$.
Appendix

A. Proof of Theorem 3

We start the analysis by showing that $\|A_\Omega x_0\|_2^2$ lies within a small neighborhood around $\|x_0\|_2^2$ for
any $S$-sparse $x_0 \in \mathbb{R}^N$. To this end, we derive the following lemma.

Lemma 2: Assume $\Omega = \{k_0, k_1, \dots, k_{K-1}\}$. Assume $A$ has the eigendecomposition $A = U_1 \Lambda_1 U_1^T + U_2 \Lambda_2 U_2^T$ and
$U_1^T \in \mathbb{R}^{L \times N}$ ($L < N$) satisfies a scaled version of the RIP of order $S$ with isometry constant $\delta_S$ as given
in (10). Then, for $\delta_S \in (0, 1)$,

$$(1 - \delta_S)\frac{L}{N}\sum_{i=0}^{K-1} \lambda_{1,\min}^{2k_i} \le \frac{\|A_\Omega x_0\|_2^2}{\|x_0\|_2^2} \le (1 + \delta_S)\frac{L}{N}\sum_{i=0}^{K-1} \lambda_{1,\max}^{2k_i} + \sum_{i=0}^{K-1} \lambda_{2,\max}^{2k_i} \tag{41}$$

holds for all $S$-sparse $x_0 \in \mathbb{R}^N$.

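Proof of Lemma 2: If $A$ has the eigendecomposition assumed in the lemma, we have $Ax_0 = U_1 \Lambda_1 U_1^T x_0 + U_2 \Lambda_2 U_2^T x_0$, and
consequently,

$$\|Ax_0\|_2^2 = x_0^T U_1 \Lambda_1^2 U_1^T x_0 + x_0^T U_2 \Lambda_2^2 U_2^T x_0 \ge \|\Lambda_1 U_1^T x_0\|_2^2 \ge \lambda_{1,\min}^2 \|U_1^T x_0\|_2^2.$$

On the other hand,

$$\|Ax_0\|_2^2 = x_0^T U_1 \Lambda_1^2 U_1^T x_0 + x_0^T U_2 \Lambda_2^2 U_2^T x_0 \le \lambda_{1,\max}^2 \|U_1^T x_0\|_2^2 + \lambda_{2,\max}^2 \|U_2^T x_0\|_2^2 \le \lambda_{1,\max}^2 \|U_1^T x_0\|_2^2 + \lambda_{2,\max}^2 \|x_0\|_2^2.$$

Thus,

$$\lambda_{1,\min}^2 \|U_1^T x_0\|_2^2 \le \|Ax_0\|_2^2 \le \lambda_{1,\max}^2 \|U_1^T x_0\|_2^2 + \lambda_{2,\max}^2 \|x_0\|_2^2. \tag{42}$$

If $U_1^T$ satisfies the scaled RIP, then from (10) and (42), for $\delta_S \in (0, 1)$,

$$(1 - \delta_S)\frac{L}{N}\lambda_{1,\min}^2 \le \frac{\|Ax_0\|_2^2}{\|x_0\|_2^2} \le (1 + \delta_S)\frac{L}{N}\lambda_{1,\max}^2 + \lambda_{2,\max}^2 \tag{43}$$

holds for all $S$-sparse $x_0 \in \mathbb{R}^N$. Similarly, one can show that for $i \in \{0, 1, \dots, K-1\}$,

$$\lambda_{1,\min}^{2k_i} \|U_1^T x_0\|_2^2 \le \|A^{k_i} x_0\|_2^2 \le \lambda_{1,\max}^{2k_i} \|U_1^T x_0\|_2^2 + \lambda_{2,\max}^{2k_i} \|x_0\|_2^2,$$

and consequently, for $\delta_S \in (0, 1)$,

$$(1 - \delta_S)\frac{L}{N}\lambda_{1,\min}^{2k_i} \le \frac{\|A^{k_i} x_0\|_2^2}{\|x_0\|_2^2} \le (1 + \delta_S)\frac{L}{N}\lambda_{1,\max}^{2k_i} + \lambda_{2,\max}^{2k_i} \tag{44}$$

holds for all $S$-sparse $x_0 \in \mathbb{R}^N$. Consequently using (24), for $\delta_S \in (0, 1)$,

$$(1 - \delta_S)\frac{L}{N}\sum_{i=0}^{K-1} \lambda_{1,\min}^{2k_i} \le \frac{\|A_\Omega x_0\|_2^2}{\|x_0\|_2^2} \le (1 + \delta_S)\frac{L}{N}\sum_{i=0}^{K-1} \lambda_{1,\max}^{2k_i} + \sum_{i=0}^{K-1} \lambda_{2,\max}^{2k_i}$$

holds for all $S$-sparse $x_0 \in \mathbb{R}^N$. ■

Lemma 2 provides deterministic bounds on the ratio $\frac{\|A_\Omega x_0\|_2^2}{\|x_0\|_2^2}$ for all $S$-sparse $x_0$ when $U_1^T$ satisfies
the scaled RIP. Using this deterministic result, we can now state the proof of Theorem 3 where we show
that a scaled version of $O_\Omega$ satisfies the RIP with high probability.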

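First observe that when all matrices $C_k$ are independent and populated with i.i.d. Gaussian random
entries, from Corollary 3 we have the following CoM inequality for $C_\Omega$. For any fixed $S$-sparse $x_0 \in \mathbb{R}^N$,
let $v = A_\Omega x_0 \in \mathbb{R}^{NK}$. Then for any $\epsilon \in \left(0, \frac{16}{\sqrt{K}}\right)$,

$$P\left\{ \left| \|C_\Omega v\|_2^2 - \|v\|_2^2 \right| > \epsilon \|v\|_2^2 \right\} \le 2\exp\left\{ -\frac{M\Gamma(v)\epsilon^2}{256} \right\}. \tag{45}$$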
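As can be seen, the right-hand side of (45) is signal dependent. However, we need a universal failure
probability bound (that is independent of $x_0$) in order to prove the RIP based on a CoM inequality. Define

$$\rho := \inf_{S\text{-sparse } x_0 \in \mathbb{R}^N} \Gamma(A_\Omega x_0). \tag{46}$$

Therefore from (45) and (46), for any fixed $S$-sparse $x_0 \in \mathbb{R}^N$ and for any $\epsilon \in \left(0, \frac{16}{\sqrt{K}}\right)$,

$$P\left\{ \left| \|O_\Omega x_0\|_2^2 - \|A_\Omega x_0\|_2^2 \right| > \epsilon \|A_\Omega x_0\|_2^2 \right\} \le 2\exp\left\{ -\frac{M\rho\epsilon^2}{256} \right\} = 2\exp\left\{ -\tilde{M} f(\epsilon) \right\}, \tag{47}$$

where $f(\epsilon) := \frac{\rho\epsilon^2}{256K}$, $\tilde{M} := MK$, and $\tilde{N} := NK$. Let $\nu \in (0, 1)$ denote a failure probability and
$\delta \in \left(0, \frac{16}{\sqrt{K}}\right)$ denote a distortion factor. Through a union bound argument and by applying Lemma 1 for
all $\binom{N}{S}$ $S$-dimensional subspaces in $\mathbb{R}^N$, whenever $C_\Omega \in \mathbb{R}^{\tilde{M} \times \tilde{N}}$ satisfies the CoM inequality (47) with
a sufficiently large number of measurements $\tilde{M}$ (as quantified by Lemma 1),
then with probability exceeding $1 - \nu$,

$$(1 - \delta)\|A_\Omega x_0\|_2^2 \le \|C_\Omega A_\Omega x_0\|_2^2 \le (1 + \delta)\|A_\Omega x_0\|_2^2 \tag{48}$$

for all $S$-sparse $x_0 \in \mathbb{R}^N$. Consequently using the deterministic bound on $\|A_\Omega x_0\|_2^2$ derived in (41), with
probability exceeding $1 - \nu$,

$$(1 - \delta)(1 - \delta_S)\frac{L}{N}\sum_{i=0}^{K-1} \lambda_{1,\min}^{2k_i} \le \frac{\|O_\Omega x_0\|_2^2}{\|x_0\|_2^2} \le (1 + \delta)\left( (1 + \delta_S)\frac{L}{N}\sum_{i=0}^{K-1} \lambda_{1,\max}^{2k_i} + \sum_{i=0}^{K-1} \lambda_{2,\max}^{2k_i} \right)$$

for all $S$-sparse $x_0 \in \mathbb{R}^N$.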

B. Proof of Corollary 1 and Corollary 2

We simply need to derive a lower bound on $\Gamma(A_\Omega x_0)$ as an evaluation of $\rho$. Recall (25) and define

$$z_0 := \begin{bmatrix} \|A^{k_0}x_0\|_2^2 & \|A^{k_1}x_0\|_2^2 & \cdots & \|A^{k_{K-1}}x_0\|_2^2 \end{bmatrix}^T \in \mathbb{R}^K.$$

If all the entries of $z_0$ lie within some bounds as $\ell_\ell \le z_0(i) \le \ell_h$ for all $i$, then one can show that

$$\Gamma(A_\Omega x_0) \ge K\left(\frac{\ell_\ell}{\ell_h}\right)^2. \tag{49}$$

Using the deterministic bound derived in (44) on $\|A^{k_i}x_0\|_2^2$ for all $i \in \{0, 1, \dots, K-1\}$, one can show
that when $\lambda = 1$ ($\lambda_{1,\max} = \lambda_{1,\min} = \lambda$ and $\lambda_{2,\max} = 0$), $\ell_\ell = (1 - \delta_S)\frac{L}{N}\|x_0\|_2^2$ and $\ell_h = (1 + \delta_S)\frac{L}{N}\|x_0\|_2^2$,
and thus,

$$\rho \ge K\frac{(1 - \delta_S)^2}{(1 + \delta_S)^2}.$$

Similarly one can show that when $\lambda < 1$,

$$\rho \ge K\frac{(1 - \delta_S)^2}{(1 + \delta_S)^2}\lambda^{4(k_{K-1} - k_0)}, \tag{50}$$

and when $\lambda > 1$,

$$\rho \ge K\frac{(1 - \delta_S)^2}{(1 + \delta_S)^2}\lambda^{-4(k_{K-1} - k_0)}. \tag{51}$$

Using these lower bounds on $\rho$ (recall that $\rho$ is defined in (46) as the infimum of $\Gamma(A_\Omega x_0)$ over all
$S$-sparse $x_0 \in \mathbb{R}^N$) in the result of Theorem 3 completes the proof. We also note that when $\lambda_{1,\max} =
\lambda_{1,\min} = \lambda$ and $\lambda_{2,\max} = 0$, the upper bound given in (47) can be used to bound the left-hand side failure
probability even when $\epsilon \ge \frac{16}{\sqrt{K}}$. In fact, we can show that (47) holds for any $\epsilon \in (0, 1)$. The RIP results
of Corollaries 1 and 2 follow based on this CoM inequality. We have omitted these details for the sake
of space.